Compositionality from Reinforcement Learning∗
Abstract
Compositional language use shows in the creative, systematic association of hitherto unseen meanings with hitherto unseen forms. I submit that compositionality, though a key feature of human language, is no reason to deny a continuum between human speech and animal communication. Basic forms of compositional creativity presuppose surprisingly little cognitive sophistication. If changes in agents’ behavioral dispositions are sensitive to similarities between different meanings and, independently, to similarities between different forms, creative compositionality can emerge in a signaling game model with reinforcement learning.

A decisive step in the evolution of language was the transition from a holophrastic term language to a compositional language [7]. A holophrastic language consists of simple expressions that are individually meaningful but are not combined in meaningful ways. In contrast, a compositional language has structured linguistic expressions that are built up from simpler, individually meaningful parts. The meaning of a complex expression is systematically related to the meanings of the parts it comprises. Human language can be used holophrastically, but it is compositional. Evidence for holophrastic communication in animals has long been reported [e.g. 14]. Animals also combine signals into sequences with novel meanings [e.g. 2, 13]. Language-trained primates even creatively produce short sequences of meaningful elements to express new meanings [e.g. 10].

A compositional language has many advantages over a non-compositional one: it can convey more with fewer means, is therefore less susceptible to noise, can be learned from fewer examples, and more besides. But to understand how the transition from a holophrastic to a compositional language might have been possible, it is not enough to point to the potential evolutionary advantages of compositionality once it is in place [contra 11].
The relevant question is rather by what mechanism early forms of compositionality could have arisen within a holophrastic system. Many learning mechanisms are capable of linking a structured meaning space with a structured space of potential expressions, and so provide potential answers to this how-question [e.g. 5, 8]. It is good to know that rather sophisticated agents can learn, and even generate, a compositional language. But once we know this, the key question becomes what minimal cognitive abilities could bring about the transition in question.

Skyrms [15] addresses the puzzle in this minimalist way and proposes locating the beginning of compositional language in a signaling game model first introduced by Barrett [3]. The following paragraphs introduce this model, together with the relevant background on signaling games. I then argue that the Barrett-Skyrms model misses a key feature of compositionality, namely that it is a flexible and potentially creative ability to associate novel expressions with novel meanings. Yet rudimentary forms of creative compositionality do not presuppose much sophistication. Agents who perceive similarities between world states and (unrelated) similarities between signals can evolve a disposition to creatively exploit existing associations between states and signals. This can be demonstrated by a simple signaling game model using Roth-Erev reinforcement learning with two defensible amendments: (i) a spill-over mechanism that also distributes accumulated rewards to non-actualized contingencies in proportion to their similarity to the successful actual contingency [cf. 12], and (ii) a small amount of lateral inhibition [cf. 16].

∗Thanks to Sven Banisch, Vincent Esche, Rüdiger Gleim, Sebastian Speitel and Elliott Wagner for discussion.
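The learning rule named in the introduction above (Roth-Erev reinforcement with similarity-based spill-over and lateral inhibition) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: states and signals are modeled as pairs of binary features, similarity is the fraction of shared features, and the function names (`similarity`, `choose`, `reinforce`, `simulate`) and the parameters `spill`, `inhibition` and `floor` are hypothetical.

```python
import itertools
import random

# Assumed setup: meanings (states) and forms (signals) are pairs of binary features.
STATES = list(itertools.product((0, 1), repeat=2))
SIGNALS = list(itertools.product((0, 1), repeat=2))


def similarity(a, b):
    """Fraction of shared features (an assumed similarity measure)."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def choose(weights, rng):
    """Roth-Erev choice rule: pick an option with probability proportional to its weight."""
    options = list(weights)
    return rng.choices(options, weights=[weights[o] for o in options])[0]


def reinforce(table, context, choice, reward, spill, inhibition, floor=0.01):
    """Reward the successful (context, choice) pair, spill over, then inhibit."""
    for c in table:
        for o in table[c]:
            if (c, o) == (context, choice):
                table[c][o] += reward
            else:
                # Spill-over: non-actualized pairs gain in proportion to how
                # similar they are to the successful pair.
                table[c][o] += spill * reward * similarity(c, context) * similarity(o, choice)
    # Lateral inhibition: competitors in the same context lose a little weight,
    # but never drop below the floor.
    for o in table[context]:
        if o != choice:
            table[context][o] = max(floor, table[context][o] - inhibition)


def simulate(rounds=5000, reward=1.0, spill=0.2, inhibition=0.1, seed=0):
    """Play a sender-receiver signaling game; return both weight tables and the success rate."""
    rng = random.Random(seed)
    sender = {s: {m: 1.0 for m in SIGNALS} for s in STATES}
    receiver = {m: {a: 1.0 for a in STATES} for m in SIGNALS}
    successes = 0
    for _ in range(rounds):
        state = rng.choice(STATES)
        signal = choose(sender[state], rng)
        act = choose(receiver[signal], rng)
        if act == state:  # communication succeeded, so both agents are reinforced
            successes += 1
            reinforce(sender, state, signal, reward, spill, inhibition)
            reinforce(receiver, signal, act, reward, spill, inhibition)
    return sender, receiver, successes / rounds
```

Calling `sender, receiver, rate = simulate()` and inspecting `sender` shows which signal each state has come to favor. Whether feature-wise regularities appear in that mapping depends on the assumed parameter values, so the sketch should be read as a starting point for experimentation and not as a reproduction of the paper's model.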
Similar resources
Composable Deep Reinforcement Learning for Robotic Manipulation
Model-free deep reinforcement learning has been shown to exhibit good performance in domains ranging from video games to simulated robotic manipulation and locomotion. However, model-free methods are known to perform poorly when the interaction time with the environment is limited, as is the case for most real-world robotic tasks. In this paper, we study how maximum entropy policies trained usi...
The evolution of compositionality and proto-syntax in signaling games
Compositionality is a key design feature of human language: the meaning of complex expressions is, for the most part, systematically constructed from the meanings of its parts and their manner of composition. This paper demonstrates that rudimentary forms of compositional, even proto-syntactic communicative behavior can emerge, without stipulating sophisticated or purposeful agency, from a vari...
Hierarchical Linearly-Solvable Markov Decision Problems
We present a hierarchical reinforcement learning framework that formulates each task in the hierarchy as a special type of Markov decision process for which the Bellman equation is linear and has analytical solution. Problems of this type, called linearly-solvable MDPs (LMDPs) have interesting properties that can be exploited in a hierarchical setting, such as efficient learning of the optimal ...
Hierarchy through Composition with Linearly Solvable Markov Decision Processes
Hierarchical architectures are critical to the scalability of reinforcement learning methods. Current hierarchical frameworks execute actions serially, with macroactions comprising sequences of primitive actions. We propose a novel alternative to these control hierarchies based on concurrent execution of many actions in parallel. Our scheme uses the concurrent compositionality provided by the l...
Hierarchy Through Composition with Multitask LMDPs
Hierarchical architectures are critical to the scalability of reinforcement learning methods. Most current hierarchical frameworks execute actions serially, with macro-actions comprising sequences of primitive actions. We propose a novel alternative to these control hierarchies based on concurrent execution of many actions in parallel. Our scheme exploits the guaranteed concurrent compositional...
Publication date: 2013